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Abstract — For a long time, detection and parameter estimation 
methods for signal processing have relied on asymptotic statistics 
as the number n of observations of a population grows large 
comparatively to the population size TV, i.e. n/N — > oo. Modern 
technological and societal advances now demand the study 
of sometimes extremely large populations and simultaneously 
require fast signal processing due to accelerated system dynamics. 
This results in not-so-large practical ratios n/N, sometimes 
even smaller than one. A disruptive change in classical signal 
processing methods has therefore been initiated in the past ten 
years, mostly spurred by the field of large dimensional random 
matrix theory. The early works in random matrix theory for 
signal processing applications are however scarce and highly 
technical. This tutorial provides an accessible methodological 
introduction to the modern tools of random matrix theory and 
to the signal processing methods derived from them, with an 
emphasis on simple illustrative examples. 



I. Introduction 

Most signal processing methods, e.g. statistical tests or 
parameter estimators, are based on asymptotic statistics of n 
observed random signal vectors |1). This is because a deter- 
ministic behavior often arises when n — > oo, which simplifies 
the problem as exemplified by the celebrated law of large 
numbers and the central limit theorem. With the increase of the 
systems dimension, denoted by N, and the need for even faster 
dynamics, the scenario n N becomes less and less likely 
to occur in practice. As a consequence, the large n properties 
are no longer valid, as will be discussed in Section [TT] This 
is the case for instance in array processing where a large 
number of antennas is used to detect incoming signals with 
short time stationarity properties (e.g. radar detection of fast 
moving objects) (2). Other examples are found for instance 
in mathematical finance where numerous stocks show price 
index correlation over short observable time periods [3| or 
in evolutionary biology where the joint presence of multiple 
genes in the genotype of a given species is analyzed from a 
few DNA samples j4). 

This tutorial introduces recent tools to cope with the n ~ N 
limitation of classical approaches. These tools are based on the 
study of large dimensional random matrices, which originates 
from the works of Wigner |3J in 1955 and, more importantly 
here, of Marcenko and Pastur J6j in 1967. Section III presents 



a brief introduction to random matrix theory. More precisely, 
we will explain the basic tools used in this field which differ 
greatly from classical approaches, based on which we will in- 
troduce statistical inference tools in the n ~ N regime. These 
tools use random matrix theory jointly with complex analysis 



methods [7| and are often known as G-estimation, named after 
the G-estimator formulas from Girko (8). Since the techniques 



presented in Section III are expected to be rather new to 
the non-expert reader, we will elaborate on toy examples 
to introduce the methods rather than on a list of theoretical 
results. More realistic application examples, taken from the 



still limited literature, are then presented in Section IV These 



will further familiarize the reader with the introduced concepts. 
These examples span from signal detection in array processing 
to failure localisation in large dimensional networks. Finally, 
in Section [V] we will draw some conclusions and provide 
future prospects of this emerging field. 

Notations: Boldface lowercase (uppercase) characters stand 
for vectors (matrices), with Ijy the N x N identity matrix. 
The notations (-) T and (-) H denote transpose and Hermitian 
transpose, respectively. The value \f^\ is denoted by i. The 
notation E is the expectation operator. The notations -^-4 
and denote almost sure and weak convergence of random 
variables, respectively. The symbol a 2 ) indicates a real 
Gaussian distribution with mean /i and variance a 2 . The norm 
I II will be understood as the Euclidean norm for vectors and 
the spectral norm for Hermitian matrices (||X|| = max; |A^| for 
X with eigenvalues Ai, . . . , \n)- The Dirac delta is denoted 



5{x). Finally, {x) 



max(x, 0) for real x, l x < y is the 



function of x equal to zero for x > y and equal to one for 

x <y. 

II. Limitations of classical signal processing 
We start our exposition with a simple example explaining 

the compelling need for the study of large dimensional random 

matrices. 



A. The Marcenko -Pastur law 

Consider y = Tsx e C N a random variable with x 
having complex independent and identically distributed (i.i.d.) 
entries with zero mean and unit variance. If yi, . . . ,y„ are 
n independent realizations of y, then, from the strong law of 
large numbers, 
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as n — > oo. For further use, we will denote Y = [yi, . . . 
with entry Yij, and realize that with this notation, 
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Here T is often called the sample covariance matrix, 
obtained from the observations yi , . . . , y n of the random 
variable y with population covariance matrix E[yy H ] = T. 
The convergence ([1} suggests that, for n sufficiently large, T 
is an accurate estimate of T. The natural reason why T can 
be asymptotically well approximated is because T originates 
from a total of Nn ^> N 2 observations, while the parameter 
T to be estimated is of size TV 2 . 

Suppose now for simplicity that T = Ijy- If the relation 
Nn ^ N 2 is not met in practice, e.g. if n cannot be taken 
very large compared to the system size N, then a peculiar 
behaviour of T arises, as both N, n — > oo while N/n does 
not tend to zero. Observe that the entry of T is given 
by 

n 
k=l 

Since T = Ijv, E[Fi/ c Y J * fc ] = 5ij. Therefore, the law of large 
numbers ensures that [T]^ — > if i ^ j, while [T]u — > 1, 
for all Since N grows along with n, we might then say 
that T converges point-wise to an "infinitely large identity 
matrix". This stands, irrelevant of N, as long as n — > oo, so 
in particular if N = 2n — > oo. However, under this hypothesis, 
T is of maximum rank N/2 and is therefore rank-deficientM 
Altogether, we may then conclude that T converges point-wise 
to an "infinite-size identity matrix" which has the property 
that half of its eigenvalues equal zero. The convergence ([T| 
therefore clearly does not hold here as the spectral norm (or 
the absolute largest eigenvalue) of T — Ijv is greater or equal 
to 1 for all N. 

This convergence paradox (convergence of T to an identity 
matrix with zero eigenvalues) arises obviously from an in- 
correct reasoning when taking the limits N, n — > oo. Indeed, 
the convergence of matrices with increasing sizes only makes 
sense if a proper measure, here the spectral norm, in the space 
of such matrices is considered. The outcome of our previous 
observation is then that the random (or empirical) eigenvalue 
distribution F T of T, defined as 
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with Ai,...,Ajv the eigenvalues of T, does not converge 
(weakly) to a Dirac mass in 1, and this is all what can be said 
at this point. This remark has fundamental consequences for 
basic signal processing detection and estimation procedures. 
In fact, although the eigenvalue distribution of T does not 
converge to a Dirac in 1, it turns out that in many situations 
of practical interest, it does converge towards a limiting 
distribution, as N,n — » oo, N/n — > c > 0. In the particular 
case where T = Ijv, the limiting distribution is known as 
the Marcenko-Pastur law, initially studied by Marcenko and 
Pastur in (6). This distribution is depicted in Figure [T] and 
compared to the empirical eigenvalue distribution of a matrix 
T of large dimensions. From this figure, we observe that the 
empirical distribution of the eigenvalues is a good match to 

'indeed, T = -j™ Y]^[/^ YiY^ which is the sum of N/2 rank-1 matrices. 
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Fig. 1. Histogram of the eigenvalues of a single realization of T = 
~Efe=iyfcyfc' yk nas proper complex Gaussian entries, for n = 2000, 
N = 500. 



the limiting distribution for N, n large. It is also observed that 
no eigenvalue seems to lie outside the support of the limiting 
distribution even for finite N, n. This fundamental property 
can be proved to hold for any matrix Y with independent 
zero mean and unit variance entries, along with some mild 
assumptions on the higher order moments. The shape of the 
Marcenko-Pastur law for different limiting ratios c is depicted 
in Figure [2] This figure suggests that, as c — > 0, the support 
of the Marcenko-Pastur law tends to concentrate into a single 
mass in 1. This is in line with our expectations from the 
discussion above for finite N, while the support tends to spread 
and eventually reaches when c is large, which is compliant 
with the existence of a mass of eigenvalues at when N > n, 
i.e. c > 1. This will be confirmed by the explicit expression 
of the limiting density given later in Equation [7] 

The general tools employed to derive the Marcenko-Pastur 
law and its generalizations to some more advanced random 



matrix models are briefly introduced in Section III 



The main implication of the above observations on signal 
processing methods is that, in general, the tools developed in 
view of applications for small N and large n are no longer 
adequate if either N is taken much larger, or if n cannot be 
afforded to be large. Observe for instance in Figure [2] that, 
even for n being ten times larger than TV, the spread of the 
Marcenko-Pastur law around 1 is significant enough for the 
classical large n assumption to be quite disputable. In the 
remainder of this section, we introduce a practical inference 
problem which assumes the setup n 3> N and which can 
be proved asymptotically biased if N/n does not converge to 



B. An eigenvalue inference problem 

The following example deals with the statistical inference of 
eigenvalues of a large population covariance matrix. This prob- 
lem is very generic and finds applications in array processing, 
such as radar detection, where the objective is to enumerate 
the sources and to infer their distances to the radar, or in 
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Fig. 2. Marcenko-Pastur density / c for different limit ratios c = lim N/n. 



cognitive radios where the objective is to detect and estimate 
the power of concurrent signal transmissions. This example 
will be used throughout the article to demonstrate the transition 
from the classical signal processing considerations towards the 
advanced random matrix approaches. 

We assume that one has access to successive independent re- 
alizations yi, . . . ,y„ of a stationary process y t = Ux t G C N , 
with U e £NxN an unsown unitary matrix and x t G C N 
a complex circularly symmetric Gaussian vector with zero 
mean and covariance matrix P. We further assume that P 
is diagonal and composed of Ni eigenvalues equal to Pi, for 
1 <i <K, with Pi < . . . < P K , and where N = Y,f=i N i- 
The presence of the unknown matrix U translates the fact 
that, in many applications, the eigenvector structure of the 
population covariance matrix T = UPU H of y t is not 
necessarily known. Hence, the diagonal entries of P are not 
directly accessible. Note that this is in sharp contrast to 
covariance matrix estimation of structured signals, as in the 
recent works |9), p0] where population covariance matrices 
have a Toeplitz-like structure. Here, no such assumption on 
U is made, so that alternative techniques need to be used in 
order to estimate the entries of P. 

Our objective is to infer the values of Pi, ... , Pk in the 
setting where both n and N are of similar order of magnitude 
or, stated in signal processing terms, when the number of 
observations of the process y t is not large compared to the 
system dimension N. As before, for readability, we denote 
Y = [yi,...,y n ] and recall that -YY H is the sample 
covariance matrix of these data. 

Let us first consider the classical approach which assumes 
that n N. We then have from a direct application of the 
law of large numbers that 
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an (asymptotically) n-consistent estimate P%° for Pk reads 

1 



N k ^ 



A, 



(3) 



as n — > oo. Since the convergence is in spectral norm and 
since the eigenvalues of UPU H are the same as those of P, 



where Ai < . . . < Xn are the ordered eigenvalues of ^YY H 
gathered in successive clusters Nfc = {N — J2t=k ^ + 
1, . . . , N - E^fe+i N i}- Indeed, the eigenvalues of ^YY H 
indexed by ZNT^ can be directly mapped to P& (their limiting 
value in the large n regime). 

However, for N and n of comparable sizes, the convergence 
(|2) no longer holds. In this scenario, the spectrum of ^YY H 
converges weakly to the union of a maximum of K com- 
pact clusters of eigenvalues concentrated somewhat around 
Pi,...,Pk, but whose centers of mass are not located at 
Pi, ... , Pk- This is depicted in Figure [3] where it is clearly 
seen that eigenvalues gather somewhat around the empirical 
values of Pi, P 2 , P3 in clusters. Therefore, as N, n — > 00, the 
estimator P^° is no longer (asymptotically) consistent. Note 
also that, depending on the values of Pi , ... , Pk (and in fact 
also on the ratio N/n), the number of clusters varies. From 
the second plot of Figure [3] we already anticipate the well- 
known problem of order selection, i.e. determining the number 
K of distinct input sources from an observation Y. For these 
reasons, more advanced analyses of the spectrum of -YY H 
must be carried out, from which (N, n)-consistent alternatives 
to ([3| can be derived, i.e. estimators which are consistent as 
both N, n grow large. This introduction of a methodological 
approach to (N, n)-consistent estimators, called G-estimators, 
is one of the objectives of this article. 

III. Large dimensional random matrix theory 

In this section, we introduce the basic notions of large di- 
mensional random matrix theory and describe two elementary 
tools: the resolvent and the Stieltjes transform. These tools 
are necessary to understand the stark difference between the 
large random matrix techniques and the conventional statistical 
methods. 

Random matrix theory studies the properties of matrices 
whose entries follow a joint probability distribution. The 
two main axes deal with the exact statistical characteriza- 
tion of small-size matrices and the asymptotic behaviour of 
large dimensional matrices. For the analysis of small-size 
matrices, one usually seeks closed form expressions of the 
exact statistics. However, these expressions become quickly 
intractable for practical analysis as one departs from white 
Gaussian matrices ]TT|-|fT3l. The mathematical methods used 
are usually very standard, mostly based on standard tools from 
probability theory. In the context of large dimensional random 
matrices, the tools are very different: asymptotic statistics and 
probability are obviously fundamental, but also linear algebra, 
as well as real and complex analysis, and combinatorics. 
The focus of these techniques are either based on invariance 
properties of the underlying matrices (free probability theory 
JT4| , fT3) , combinatorics fl6| , fT7) , Gaussian integration by 
part methods |18|, [19]) or on independence properties of 
their entries ]6[, |20|, |21[. In the following, we will mostly 
concentrate on tools for large dimensional random matrices 
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Fig. 3. Histogram of the eigenvalues of ^YY H for N = 300, n = 3000, 
with P diagonal composed of three evenly weighted masses in (i) 1, 3 and 
7 at the top, (ii) 1, 3, and 4 on the bottom. 



with independent entries, although some results we present 
originate from other techniques. The main tool of interest in 
the spectral study of these matrices is the Stieltjes transform, 
which we introduce hereafter. 

A. The Stieltjes transform 

Similar to the celebrated Fourier transform in classical 
probability theory and signal processing which allows one to 
perform simpler analysis in the Fourier (or frequency) domain 
than in the initial (or time) domain, the spectral analysis of 
large dimensional random matrice^] is often carried out with 
the help of the Stieltjes transform. 

Definition 1 (Stieltjes transform): Let F be a real probabil- 
ity distribution function and z £ C taken outside the support 
§ of F. Then the (Cauchy-)Stieltjes transform mp(z) of F at 
point z is defined as 

m F {z) 4 / -!-dF(t). 
Js t- z 

2 What one refers here to as spectrum is the eigenvalue distribution of the 
underlying matrix. 



Note importantly that, for X 6 C Hermitian with 
eigenvalues Ai, . . . , Ajv, and eigenvalue distribution F x , the 
Stieltjes transform tox of X is, for z G C \ {Ai, . . . , A^v}, 
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The Stieltjes transform, in the same way as the Fourier 
transform, has an inverse formula, given by 

F(b) - F(a) = lim / 3 \m F (x + iy)} dx. (4) 
vloJa 

This formula is however rarely used in practice. What this 
says for random matrices is that, if the Stieltjes transform of 
a matrix X is known, then one can retrieve its eigenvalue 
distribution, which is often more difficult to obtain in the 
spectral domain than in the Stieltjes transform domain. 

Since we will constantly deal with matrices of the form 
(X— zljv) -1 , called the resolvent matrix of X, the analysis of 
random matrices is fundamentally based on the exploitation of 
classical matrix inversion formulas. In order to better capture 
the essence of the Stieltjes transform approach, we hereafter 
outline the core arguments of the proof of the Marcenko-Pastur 
law, as performed in the original article [6] back in 1967. 

B. The Marcenko-Pastur law 

We recall that we wish to determine the expression of the 
limit of the eigenvalue distribution of YY H , given in Figure [5] 
as N, n — !> oo, N/n -> c, where Y = [yi , . . . , y n ] e £ N * n 
has i.i.d. entries of zero mean and variance l/n. In the Stieltjes 
transform domain, we therefore need to evaluate ttiyy" i z ) = 
±tr(YY"-zI N )-\ 

The idea is to concentrate on the value of each individual 



diagonal entry of the resolvent (YY H — zl^)^ 1 . Because 
of its obvious symmetrical structure, it suffices to study 
the asymptotic behaviour of the first diagonal entry. Writing 
(YY H — zljv) -1 as a block matrix with upper-left corner 
composed of the unique (1,1) entry and applying the classical 
Schur complement ] |22) , it is easy to see that 

1 



[(YY H - zI N )-i] 



ii 



Z yH(Y H Y 



(5) 



where we have defined Y and yi such that Y H = [yi Y H ]. 
Due to the independence between yi and Y and the fact 
that yi has i.i.d. entries, the quantity y5 H (Y H Y — zl„) _1 yi 
approaches - tr(Y H Y— zl„) _1 as the system sizes N, n grow 
large. Indeed, this is clearly true in the expectation over yi 
(with equality) and, conditioned on Y, the convergence some- 
what recalls a law of large numbers. The interesting part is to 
notice that the latter trace expression is tightly connected to 
jj tr(YY H — zljy )^ 1 , as the only major difference between the 
two lies in a single column change (a rank-one perturbation) 
which asymptotically will not alter the limiting normalized 
trace. A complete exposition of this derivation can be found 
in many random matrix books, e.g. J23|-[25|. 

Since this remark is valid for all diagonal entries of the 
resolvent (YY H — zljv) -1 , it holds also true for their average 
z!n)~ x , i-e. for the Stieltjes transform to Y y h i z ) 



itr(YY H 
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of YY H . Recalling |5]), the Stieltjes transform can therefore be 
asymptotically well approximated as a function of itself. This 
naturally leads to yy h ( 2 ) to be approximately the solution of 
a fixed-point equation (or implicit equation). In many random 
matrix models, this is as far as the Stieltjes transform would 
go, and we would conclude that the limiting spectrum of the 
matrix under study is defined through its Stieltjes transform, 
whose expression is only known through a fixed-point equation 
(with usually a unique solutiorj^J. 

In the case of the Marcenko-Pastur law, this fixed-point 
equation reads precisely 

1 

TO YY H (Z) ~ jr^r (6) 

1 — c — z — zcm Y Y H \ z ) 

which turns out to be equivalent to a second order polynomial 
in m YY H (z) which can be solved explicitly. Using the inverse 
Stieltjes transform formula Q, we then obtain the limiting 
spectrum of YY H with density f{x) given by 

f(x) = (1 - c-ySix) + J—y/( x - a )(b-a 



2ircx 

with a = (1 — \/c) 2 , b = (1 + \/c) 2 , and we recall that c is 
the limiting ratio lim NJ n, where the square-root appearing in 
the expression originates from the roots of the second order 
polynomial defining m YY n(z). 

It is interesting to see that this expression is only defined for 
a < x < b and possibly x = 0, which defines the support of 
the distribution. In particular, we confirm, as already discussed 
that, as c — > 0, [a, 6] tends to a singleton in 1. The convergence 
to {1} is however quite slow as it is of order %fc for c small 
(since (1 + \/c) 2 ~ 1 + in this regime). This suggests 
that n must be much larger than N for the classical large-n 
approximation to be acceptable. Typically, in array processing 
with 100 antennas, no less than 10 000 signal observations 
must be available for subspace methods relying on the large-n 
approximation to be valid with an error of order 1/10 on the 
assumed eigenvalues of YY H . The resulting error may already 
lead to quite large fluctuations of the classical estimators. 

This observation strongly suggests that the classical tech- 
niques which assume n S> N must be revisited, as will be 
detailed in Section |III-D| In the next section, we go a step 
further and introduce a generalization of the Marcenko-Pastur 
law for more structured random matrices. 

C. Spectrum of large matrices 

As should have become clear from the discussion above, 
a central quantity of interest in many signal processing ap- 
plications is the sample covariance matrix of n realizations 
Y = [yi, . . . , y„] of a process y t = Tx t with x t having i.i.d. 
zero mean and unit variance entries. The realizations may be 
independent, in which case Y = TX, X = [xi, . . . , x„] with 
i.i.d. entries, or have a (linear) time dependence structure, in 
which case y t is often modelled as an autoregressive moving- 
average (ARMA) process; if so, we can write Y = TZR 
with Z a random matrix with i.i.d. zero mean entries and R 

3 For z £ K — , the fixed-point map is usually a standard interference 
function, as described in [26 1, allowing for simple algorithmic methods to 
solve the fixed-point in practice. 



a deterministic time correlation matrix (generally Toeplitz). 
The limiting eigenvalue distribution of many of such random 
matrix structures have been studied in the past ten years and 
have led to many fundamental results which only recently have 
found their way to signal processing applications. 

In the following, we treat the simplest case of a sample 
covariance matrix based on n independent observations of the 
process y t described above, when x t has i.i.d. entries with 
zero mean. The outline of the techniques below is in general 
the same when it comes to more complicated models. 

We start with the important generalization of the Marcenko- 
Pastur law to matrices with left-sided correlation, whose 



general proof is due to Bai and Silverstein |20|. 



Theorem 1: Consider the matrix 



where X = [xi,...,x n ] g ^Nxn nas jj^ en t r i es w ith 



(') zero mean and variance 1/n, and T G 



•<NxN 



is nonnegative 

Hermitian whose eigenvalue distribution F T converges weakly 
to F T as N — » oo. Then, the eigenvalue distribution F T of 
T converges weakly and almost surely to the distribution F 
with Stieltjes transform mf(z) = cmp^z) + (c — 1)-, where 
mp(z) is a solution to 



rriF_{z) 



1 + tmp(z) 



dF T (t) 



(8) 



for all z S 



{ze 



[z] > 0}. 



Note here that, contrary to the case treated previously where 
T = Ijv, the limiting eigenvalue distribution of T does not 
take an explicit form, but is only defined implicitly through its 
Stieltjes transform which satisfies a fixed-point equation. This 
result is nonetheless usually sufficient to derive explicit detec- 
tion tests and estimators for our application needs. Although 
not mentioned for readability in the statement of the theorem, 
it is usually true that the fixed-point equation has a unique 
solution on some restricted set, and that classical fixed-point 
algorithms do converge surely to the proper solution; this is 
particularly convenient for numerical evaluations. 

An important remark for the subsequent discussion is that 
the integral in ([8]) can be easily related to the Stieltjes 
transform of nip of F T to give the equivalent representation 



mf(z) 
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nriF{z) 



(9) 



Although the original formula of [20| is given by (8), Equation 
(|9| is more interesting in several aspects. The important 
observation in |9| is that the Stieltjes transform mx can be 
related to the limiting Stieltjes transform mp of T through this 
simple equation. This is the fundamental anchor for inference 
techniques, where the observable data (T) get connected to 
the hidden parameters (information on T) to be estimated. 

Indeed, in the sample covariance matrix model, eigen- 
inference methods used in subspace estimators consist in 
retrieving information about the population covariance matrix 
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T from the sample covariance matrix T. Therefore, if one can 
relate the information to be estimated in T to tot (which is 
asymptotically %), then |9} provides the fundamental link 
between T and T through their respective limiting Stieltjes 
transform, from which estimators can be obtained. This pro- 
cedure is detailed in the next section. 

Note that Theorem [T] provides the value of mp(z) for all 
z E C + . Therefore, if one desires to numerically evaluate the 
limiting eigenvalue distribution of T, it suffices to evaluate 
mp(z) for z = x + ie, with e > small and for all x > and 
then use the inverse Stieltjes transform formula Q to describe 
the limiting spectrum density / by 

fix) ~ —5 [mpfx + te)] . 

7T 

We used this technique to produce Figure [3] in which T = 
UPU H was assumed composed of three evenly weighted 
masses in {1,3,7} (top) or {1,3,4} (bottom). 

As mentioned above, it is possible to derive the limiting 
eigenvalue distribution of a wide range of random matrix 
models or, if a limit does not exist for the model (this may 
arise for instance if the eigenvalues of T are not assumed 
to converge in law), to derive deterministic equivalents for 
the large dimensional matrices. For the latter, the eigenvalue 
distribution F T of T can be approximated for each N by a 
deterministic distribution Fn such that F T — Fjv => almost 
surely. These distribution functions Fjv usually have a very 
practical form for analysis; see e.g. [27|-|29| for examples 
of random matrix models that do not admit limits but only 
deterministic equivalents. 

The knowledge of the limiting eigenvalue distribution of 
such matrices is not of particular interest to signal processing, 
but it is a required prior step before performing statistical 
inference and detection methods. In wireless communication 
problems however, such as that of evaluating the capacity 
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^logdet fljv + ^HH" 

of a multi-antenna Gaussian channel H with noise variance 
a 2 , these tools alone are fundamental, see e.g. p8)-|32). 
In particular, we recognize in the above formula that the 
Stieltjes transform tr(HH H - zIn)^ 1 of HH H , and not its 
eigenvalue distribution, is the quantity of central interest. This 
is probably the main explanation why wireless communication 
applications of random matrix results have appeared as early 
as in 1999 (33) , (34), while applications for signal processing 
methods emerged only very recently. 

In the following, we pursue our study of the spectral prop- 
erties of large dimensional random matrices to applications 
in statistical inference and introduce the eigen-inference (or 
G-estimation) method. 

D. G-estimation 

As introduced in the previous section, one of the major 
objectives of random matrix theory for signal processing is to 
improve the statistical tests and inference techniques derived 



under the assumption of infinitely many observations for 
scenarios where the system population size N and the number 
of observations n are of the same order of magnitude. For 
this, we will rely on the connections between the Stieltjes 
transforms of the population and sample covariance matrices. 

The G-estimation method, originally due to Girko (8), 
intends to provide such (N, n)-consistent estimators. The 
general technique consists in a three-step approach along the 
following lines. First, the parameter to be estimated, call it 
9, is written under the form of a functional of the Stieltjes 
transform of the deterministic matrices of the model, say here 
9 = /(tot) for a population matrix T with Stieltjes transform 
Tot- Then, tot is connected to the Stieltjes transform ro T 
of an observed matrix T through a formula similar to that 
described earlier in (|9j, e.g. tot = g{rnp) for a certain 
function g. Finally, connecting the pieces together, we have 
a link between 9 and the observations T, this link being only 
exact asymptotically. For finite N, n dimensions, this natu- 
rally produces an (asymptotically) (N, n)-consistent estimator 

e = 5(to t ). 

While the fundamental tool for connecting population and 
observation spaces is the Stieltjes transform, the second im- 
portant tool, that will help connecting the parameter 9 to the 
Stieltjes transform tot, is the Cauchy integral formula and 
complex analysis in general. 

For better understanding, we hereafter elaborate on a 
concrete example to describe the eigen-inference framework 
within the context of the estimation of the source powers 
discussed in Section Hl-B I 

We recall that one observes a matrix of the type Y = 
UPsX, where U e C NxN is unitary, X e C Nxn is filled 
with i.i.d. entries with zero mean and unit variance and P is 
diagonal with Ni entries equal to Pi, and Nk entries 
equal to Pk, all values of Pj being distinct. 

While the objective in the previous section was to charac- 
terize the asymptotic eigenvalue distribution of iYY H when 
P is known, the target of this section is instead to infer 
Pi, ... , Pk from the observation Y. For this, we make the 
following fundamental observation. From Cauchy's complex 
integration formula |7), 

1 / z 

Pk = -_ — <f> - dz 



2tti Jq Pk — z 

for a complex positively oriented contour Gk circling once 
around P&. If Gk does not enclose any of the other Pj, i ^ k, 
then it is also true, again from Cauchy's integration formula, 
that 



Pk = 



1 N 
1 N 



hy^^—dz 

V ^ P t ~z 

i=0 1 

I f 

- (h zmp(z)dz. 
k Je k 



(10) 



Hence, we are able to write Pk as a function of the Stieltjes 
transform of the non-observable matrix P. The next step is 
to link top to the Stieltjes transform of iYY H . For this, 
we use (|9]l with T = UPU H (which has the same eigenvalue 
spectrum as P) and F the limiting spectrum of i Y H Y, which 
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after the variable change z = mpiu) leads to the elementary 
expression 

11/ m' F {u) 
P k = -—<p u - du 
c 2m J e ^ mpyu) 

for some contour G' k which happens to circle exactly around 
the cluster of eigenvalues of T associated to P k only, and 
with m'p the complex derivative of mp(z) with respect to 
z. It is therefore fundamental that there exists a contour that 
circles around the cluster associated to P k only. It may happen, 
depending on the ratio c and the values of the P k , that 
the limiting spectrum of T generates less than K clusters, 
some clusters being associated with multiple power values, as 
depicted in the second graph of Figure [3] In this scenario, 
the above result does not hold and the technique collapses. 
In signal processing applications, this effect is linked to the 
problem of source separation. From now on, we therefore 
assume that the cluster of eigenvalues associated to P k is 
perfectly isolated. Necessary and sufficient conditions for this 



to hold are clearly established |57|. 

Under the cluster separability assumption, an estimator P k 
for Pk is now straightforward to obtain. Indeed, replacing tuf 
by its empirical estimate mi Y H Y , Pk is approximately equal 
to a complex integral whose integrand is constituted of rational 
functions. A final step of residue calculus [7| completes the 
calculus and we obtain explicitly the estimator 



Pk 



(Xm Mm) 



(ID 



where is defined as in |3j, Ai < ... < \n are the 
eigenvalues of ^YY H , and fii < ■ ■ . < /ijv are the ordered 

eigenvalues of diag(A) — jVAvA , with A = (Ai, . . . , \n) T - 
This estimator therefore improves the n-consistent estimator 
|3]l to an (TV, n)-consistent estimator which surprisingly turns 
out to have much better performance statistics for all couples 
(N,n). Performance figures will be provided in Section IV 
for signal processing scenarios of more practical interest. 

Multiple estimators can be derived similarly for different 
types of models and problems, a large number of which have 
been investigated by Girko and named the G-estimators. A list 
of more than fifty estimators for various applications can be 
found in [35). In Section [TV] some examples in the context of 
path loss estimation in wireless communications and angle of 
arrival estimation for array processing will be discussed. 

We now turn to a finer analysis of random matrices which 
is no longer concerned with the weak limit of eigenvalue 
distributions but rather with the limiting fluctuations of in- 
dividual eigenvalues (in particular the extreme eigenvalues). 
This problem is at the core of some standard statistical tests 
for signal detection, such as the generalized likelihood ratio 
test (GLRT). 

E. Extreme eigenvalues 

It is important to note at this point that Theorem [T] and all 
theorems deriving from the Stieltjes transform method only 
provide the weak convergence of the empirical eigenvalue 
distribution to a limit F but do not say anything about the 



behaviour of the individual eigenvalues. In particular, the 
fact that the support of F is compact does not mean that 
the empirical eigenvalues will asymptotically all lie in this 
compact support. Indeed, consider for instance the distribution 
Fn(x) = ^j^l x <i + jj^-x<2- Then the largest value taken 
by a random variable distributed as Fn equals 2 for all N 
although it is clear that Fjf converges weakly to F = l x <i 
(and then the support of F is the singleton {1}). In the context 
of Theorem[T] this reasoning indicates that, although the weak 
limit of F T is compactly supported, some or even many 
eigenvalues of F T may still be found outside the support. 

As a matter of fact, it is proved in [36] that, if the entries 
of X in Theorem [T] have infinite fourth order moment, then 
the largest eigenvalue of XX H tends to infinity. For non- 
degenerate scenarios though, i.e. when the fourth moment of 
the entries of X is finite, we have the following much expected 
result [! 371, (381. 



Theorem 2 (No eigenvalue outside the support): Let T be 
defined as in Theorem [T] with X having entries with finite 
fourth order moment and with F T =>■ F T , such that T has no 
eigenvalue outside the support of F T . We recall that F T F 
for F defined in Theorem [I] Take now [a, b] C K U {±00} 
strictly away from the support of F. Then, almost surely, there 
is no eigenvalue of T found in [a, b] for all large N. 

This result says in particular that, for T = Ijy, and for all 
large N, there is no eigenvalue of T found away from the 
support of the Marcenko-Pastur law. This further generalizes 
to the scenario where T has a few distinct eigenvalues, each 
with large multiplicities, in which case F is formed of multiple 
compact clusters, as depicted in Figure [3] In this context, it is 
proved in [39| that the number of eigenvalues asymptotically 
found in each cluster matches exactly the multiplicity of 
the corresponding mass in T (assuming cluster separability). 
This is often referred to as the property of exact spectrum 
separation. Note that this point fully justifies the last step in 
the derivation of the estimator ( fTTj i where we somewhat hid 
the fact that the residue calculus naturally imposes to know 
where the eigenvalues of T are precisely located. As we will 
see in Section |TV} these results on the asymptotic location of 
the eigenvalues allow us to derive new hypothesis tests for the 
detection of signals embedded in white noise. In particular, we 
will present tests on the received sample covariance matrix T 
that confront a pure white noise against a signal-plus-noise 
hypotheses, i.e. T = Ijy against T ^ Ijy. 

Based on the earlier reasoning on the weak limit inter- 
pretations, we insist that, while Theorem [T] does state that 
F T =>■ l x <i implies that F is the Marcenko-Pastur law, it 
does not state that F T =>■ l x <i implies that no eigenvalue is 
asymptotically found away from the support of the Marcenko- 
Pastur law. Indeed, if T = diag(l, . . . , 1, a), with a 7^ 1, 
the theorem cannot be applied. Conversely, for T described 
above, it is not clear whether an eigenvalue of T will be 
seen outside the limiting support. This interrogation is of 
fundamental importance for the performance evaluation of 
asymptotic detection tests and has triggered a recent interest 
for this particular model for T, called the spike model, which 
will be discussed next. 
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F. Spike models 

It is of particular interest for signal processing applications 
to model low rank signal spaces embedded in white noise. 
This is the case in array processing where a single signal 
source with unique propagation path is received during several 
consecutive time instants, hence producing a rank-one signal 
matrix to be added to the ambient full rank white noise. The 
objective of this particular model is to derive simple detection 
and identification procedures that do not fall in the rather 
involved scheme of Section IIII-DI 

Under the generic denomination of "spike models" we 
understand all random matrix models showing small rank 
perturbations of some classical random matrix models. For 



instance, for X G 



iNxn 



with i.i.d. entries with zero mean 



and variance 1/n, the matrices T = (X + E)(X 



E 



H 



E <£ C ivx ™ such that EE H is of small rank K, or T = 
(Ijy + E)2XX h (Ia, + E)s, E e C NxN of small rank K, 
fall into the generic scheme of spike models. For all of these 
models, Theorem[Tjand similar results, see e.g. pO) , claim that 
F T converges weakly to the Marcenko-Pastur law, which is 
the major motivation of these models as they are rather simple 
to analyze. The interest here though is to study the behaviour 
of the extreme eigenvalues of T. 

To carry on with the same models as before, we concentrate 
here on T modeled as in Theorem [l] with T taken to 
be a small rank perturbation of the identity matrix. Similar 



properties can be found for other models e.g. in |41|, [42|. 
The first result deals with first order limits of eigenvalues and 
eigenvector projections of the spike model [41 1, |43|, [44|. 

Theorem 3: Let T be defined as in Theorem[l] X have i.i.d. 
entries with zero mean and variance 1/n, and T = Iat+E with 
E = Y^i=i UJ i n i xlV i i ts spectral decomposition (i.e. u^u 3 = 
Sj), where K is fixed and uji > ■ ■ ■ > ljk > 1- Denote 
Ai > . . . > Xn the ordered eigenvalues of T and € C N 
the eigenvector associated with the eigenvalue A^. Then, for 
1 < i < K, as N, n — > oo with Nj n — > c, 



A,- 



(1 + Tc) 2 

Pi = 1 + LOi 



+ c 



and 




, if LJi < V 
, if (Ji > V 

if (Ji < yfc 
if W t > y/c. 



The main observation of this result is that, if u>i < yfc, 
then the largest eigenvalue of T asymptotically converges to 
the right edge of the Marcenko-Pastur law and is therefore 
hidden in the main bulk of the eigenvalues, while if uji > ^fc, 
the largest eigenvalue of T is found outside the main cluster. 
This has fundamental consequences. When T is the sam- 
ple covariance matrix of signal-plus-noise data with signal 
strength wj., then, if the signal is strong enough or conversely 
if N/n is small enough, it is possible to detect this signal 
based on the presence of an eigenvalue exterior to the main 
eigenvalue cluster of T. Otherwise, neither the eigenvalues 
nor the eigenvectors of T provide any information on the 
presence of a signal, at least asymptotically. As a conclusion, 



Marcenko-Pastur law, c — 5/4 
Empirical eigenvalues 




Eigenvalues 

1 Ul I 

Fig. 4. Eigenvalues of T = T 2 XX" T 2 , where T is a diagonal of ones 
but for the first four entries set to {3, 3, 2, 2}, N = 500, n = 400. The 
theoretical limiting eigenvalue of T is emphasized. 



in the latter scenario, there is no way to decide if a signal 
is indeed present or not. This provides fundamental limits of 
signal detection tests. This subject is further discussed in the 



application Section IV 



In Figure |4j we depict the eigenvalues of a spike model as in 
Theorem [3] with four population eigenvalues greater than one. 
Two of them exceed the detectability threshold uii > yfc, while 
the other two do not. As expected, only two eigenvalues are 
visible outside the bulk, with values close to their theoretical 
limit. 

Although this was not exactly the approach followed ini- 
tially in p3| , the same Stieltjes transform and complex integra- 
tion framework can be used to prove this result. We hereafter 
give a short description of this proof in the simpler case where 
E = wuu H and w > \fc. 

1) Extreme eigenvalues: By definition of an eigenvalue, 
dct(T — zljv) = for z an eigenvalue of T. Some algebraic 
manipulation based on determinant product formulas and 
Woodbury's identity p2[ lead to 



det(f - zI N ) = f N (z) det{I N + E) det(XX H - zl N ) 

with f N (z) = 1 + z— u H (XX H - zIn)- 1 !!. 

1 + oj v 1 



An eigenvalue of T not inside the main cluster cannot cancel 
the right-hand determinant in the first line and must therefore 
cancel Jn{z)- Standard random matrix lemmas then show that, 
since u H (XX H — z\^)^ 1 vl is asymptotically close to m,p{z), 
the Stieltjes transform of the Marcenko-Pastur law, 



f N (z) ^ f(z) ±l + z 



1 



-mf(z). 



Substituting rriF(z) by its exact formulation (that is, the 
Stieltjes transform of the Marcenko-Pastur law), we then 
obtain that f(z) = is equivalent to z = 1 + lu + c^f- if 
oj > y/c and has no solution otherwise, which is the expected 
result. 
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2) Eigenvector projections: For the result on eigenvector 
projections, we use again the Cauchy integration scheme 
which states that, for any deterministic vectors a, b g C N 
and for all large N, 

a H UiuS H b = /a H (f - zl^^b (12) 

27TZ J e 

where C is a positively oriented contour circling around to 
and excluding 1, and where we again recognize a quadratic 
form involving the resolvent of T. This is an immediate 
consequence of the fact that (T— zljv) -1 has a pole in Ai with 
residue — Uiu^ . We then use the same matrix manipulations as 
in the previous section (by isolating the term XX H ) to finally 
obtain the result of Theorem [3] after residue calculus. 

These first order results are important since they suggest 
detection and identification tests in signal processing scenarios 
with extremely large arrays. However, for realistic small 
system sizes, these results are not sufficient to provide efficient 
statistical tests. To overcome this limitation, we need to go 
beyond the first order limits and characterize the fluctuations 
of the extreme eigenvalues and eigenvector projections. Recent 
works have provided such statistics for the results of Theo- 
rem [3] as well as for the fluctuations of the estimator ( fTT| , 
which we introduce presently. 

G. Fluctuations 

We first mention that the fluctuations of functionals of F T — 
F in Theorem [T] have been derived in (45). More precisely, 
for any well-behaved function / (at least holomorphic on the 
support of F) and under some mild technical assumptions, 

N J /(f) (dF*(t) - dF(i)) X(0,a 2 ) 

with a 2 known. This in particular applies to /(f) = it — z)^ 1 
for z G C \ M + , which gives the asymptotic fluctuations of 
the Stieltjes transform of T. From there, classical asymptotic 
statistics tools such as the delta-method [1] allow one to 
transfer the Gaussian fluctuations of the Stieltjes transform 
m,j, to the fluctuations of any function of it, e.g. estimators 
based on m^,. The following result on the fluctuations of the 
estimator (jTTJ therefore comes with no surprise [46 1. 

Theorem 4: Consider the estimator Pk in ( flT) . Assuming 
the entries of X have finite fourth order moment (to ensure 
Theorem [2]) and that Pk generates an isolated cluster, 

N(P k - P k ) => N (0, a 2 ) 

where a 2 is evaluated precisely in |46| as a complex integral 
form involving derivatives of the limiting Stieltjes transform 
of T. 

In the case of the spike model, it is of interest to derive the 
fluctuations of the extreme eigenvalues and eigenvector pro- 
jections, whose limits were given in Theorem [3] Surprisingly, 
the fluctuations of these variables are not always Gaussian, 
(44), (47), (48). 



Theorem 5: Let T be defined as in Theorem [3j Then, for 

1 < k < K, if Ldk < \fc, 

where is known as the complex Tracy-Widom distribution, 
described in (49) as the solution of a Painleve differential 
equation. If, on the other hand, cjj. > %fc, then 

V A fc — pk j 

where the entries of the 2x2 matrix can be expressed in 
a simple closed form as a function of u!k only. Moreover, for 
k 7^ k' such that ujk , u' k > \fc and distinct, the centered-scaled 
eigenvalues and eigenvector projections are asymptotically 
independent. 

The second result of Theorem [5] can be obtained using the 
same Stieltjes transform and complex integration framework 
as described in the sketch of proof of Theorem [3] see [44] for 
details. Again in that case, these are mainly the fluctuations 
of the Stieltjes transform of XX H and an application of the 
delta-method which lead to the result. The first result of 
Theorem [5] is proved with very different methods which we do 
not introduce here (since the extreme eigenvalues are inside the 
base support, complex contour integration approaches cannot 
be performed). These methods involve techniques such as 
orthogonal polynomials and Fredholm determinants which rely 
on the study of the finite dimensional distribution of the 
eigenvalues of XX H before ultimately providing asymptotic 
results, see (24|, (50| for details. 

The Tracy-Widom distribution is depicted in Figure [3] It 
is interesting to see that the Tracy-Widom law is centered on 
a negative value and that the probability for positive values 
is low. This suggests that, if the largest eigenvalue of T is 
greater than (1 + \fc) 2 , then it is very likely that T is not 
the identity matrix. A similar remark can be made for the 
smallest eigenvalue of T which has a mirrored Tracy-Widom 
fluctuation JBT) , when c < 1. 

Theorem [5] allows for asymptotically accurate test statis- 
tics for the decision on signal-plus-noise against pure noise 
models, thanks to the result on the extreme eigenvalues. 
Moreover, the results on the eigenvector projections provide 
even more information to the observer. In the specific context 
of failure diagnosis [ |44| , introduced in Section |IV| this can 
be used for the identification of the (asymptotically) most 
likely assumption from a set of M failure models of the type 
(I N + E^XX^Ijv + E fc )», for k e {1, . . . , M}. 

H. Practical limitations 

For all techniques derived above, it is important to keep 
in mind that the results are only valid under the assumption 
of large matrix dimensions. As it turns out from convergence 
speed considerations, estimators based on weak convergence 
properties of the eigenvalue distribution are accurate even for 
very small system sizes (N — 8 is often sufficient, if not less), 
while second order statistics of functionals of the eigenvalue 
distribution roughly require N to be at least of order 64. When 
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Tracy-Widom law T 2 
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Centered-scaled largest eigenvalue of XX H 

Fig. 5. Distribution of TVS c~ 2 (1 + v^) - 3 [Ai - (1 + ^c) 2 ] against the 
Tracy-Widom law for TV = 500, n = 1500, c = 1/3, for the covariance 
matrix model XX H , X 6 QNxn w j m Gaussian i.i.d. entries. Empirical 
distribution from 10, 000 Monte-Carlo simulations. 



it comes to asymptotic properties of the extreme eigenvalues, 
due to the loss of the collective averaging effect of all the 
eigenvalues, the convergence speed is much slower so that 
N = 16 is often a minimum for first order convergence, 
while N of order 128 is required for asymptotic second order 
statistics to become accurate. Note also that these empirical 
values assume "non-degenerated" conditions; for instance, for 
N = 128, the second order statistics of the largest eigenvalue 
in a spike model with population eigenvalue oj = y/c + e 
for a small e > are often far from Gaussian with zero 
mean (since they should be simultaneously close to follow the 
very negatively-centered Tracy-Widom distribution) so that, 
depending on how small e is chosen, much larger N are 
needed for an accurate Gaussian approximation to arise. This 
suggests that all the above methods have to be manipulated 
with extreme care depending on the application at hand. 

The results introduced so far are only a small subset of the 
important contributions resulting from more than ten years of 
applied random matrix theory. A large exposition of sketches 
of proofs, main strategies to address applied random matrix 
problems, as well as a large amount of applications are 
analyzed in the book (25) in the joint context of wireless 
communications and signal processing. An extension of some 
notions introduced in the present tutorial can also be found in 
[52 1 . Deeper random matrix consideration about the Stieltjes 
transform approach can be found in (23} and references 
therein, while discussions on extreme eigenvalues and the tools 
needed to derive the Tracy-Widom fluctuations, based on the 
theory of orthogonal polynomials and Fredholm determinants, 
can be found in e.g. p4| , | [50| , (53) and references therein. 
In the next section, we discuss several application examples, 
already partly introduced above, which apply Theorems [TJ3] in 
various contexts of hypothesis testing and subspace statistical 
inference. 



IV. Case studies 

In this section, several classical detection and estimation 
schemes are revisited for the regime N ~ n studied in 
Section HTH 

A. Multi-source power inference in i.i.d. channels 

Our first case study originates from the field of wireless 
communication, closely related to the previous example of 
eigenvalue inference. We consider a cognitive sensing scenario 
p4| in which an iV-antenna sensing device (a mobile terminal 
for instance) collects data from K multi-antenna transmitters. 
Transmitter k is equipped with M k antennas and sends the 
signals Xfc(t) € C Mk to the receiver at time t through the 
channel 6 c-Wxa/^ containing i.i.d. Gaussian entries with 
zero mean and variance 1 /iV"j^] We assume Mi + . . . + Mk — 
M < N and that H = [H 1; . . . , Hk] remains static during 
at least n symbol periods. The receiver also captures white 
Gaussian noise crw(t) with zero mean and variance a 2 . The 
receive signal y(t) e C N at time t is then modelled as 

K 

y(i) = ^PiH-Mt) + ffw(i). 



i=i 



Gathering n realizations of y(t) € 
[y(l), • ■ ■ ,y(n)], Y takes the form 



in the matrix Y 



Y= HPl trl 



N. 



where X = [x(l), . . . , x(n)], x(t) = [ Xl (i) T , . . . , x K {t)] T , 
P = diag(PiI Ml , ■ • ■ , Pk1m k ), and W = [w(l), . . . , w(n)]. 
We wish to estimate the power values Pi, ... , Pk from the 
observation Y. We see that the sample covariance matrix 
approach developed in Section |III-D is no longer valid as the 
population matrix T = HPH H + ct 2 Ijv is now random. 

Following the general methodology developed in Sec- 
tion [DTD] assuming the asymptotic cluster separation property 
for a certain Pk, we first use the connection between Pk and 
mp(z), the limiting Stieltjes transform of P, given by ( fTO) . 
The innovation from the study in Section III-D is that the 
link between rap and mp, the limiting Stieltjes transform of 
_i_Y H Y, is more involved and requires some more random 
matrix arguments than Theorem [T] alone. This link, along with 
a generalization of Theorem [2] to account for the randomness 
in T, induces the following (N, n)-consistent estimator Pk of 
Pk (44) 

Pk = M k (n-N) 

where is the set of indexes of the eigenvalues of ^YY H 
associated to Pk, r\i < ... < 77^ are the eigenvalues of 
diag(A) — jj^/X^/X and fj,i < . . . < /ijv are the eigenvalues 

of diag(A) — ^-v/A-y/A , where Ai < . . . < A^v are the ordered 
eigenvalues of -YY H . 

In Figure [6] the performance comparison between the clas- 
sical large n approach and the Stieltjes transform approach for 

4 The normalization of the channel entries by 1/jV along with ||xfc(t)|| 
bounded allows one to keep the transmit signal power bounded as N grows. 
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Fig. 6. Normalized mean square error of largest estimated power P3 for 
Pi = 1/16, P 2 = 1/4 P 3 = 1, Mi = M 2 = M 3 = 4, At = 12, 
N = 24, n = 128. Comparison between conventional and Stieltjes transform 
approaches. 



power estimation is depicted, based on 10, 000 Monte Carlo 
simulations. Similar to the estimator Q, the classical approach 
of Pfc, denoted Pf* 3 , consists in assuming n ^> N and N 3> 
Mi for each i (recall that Mj is the multiplicity of Pj). A clear 
performance gain is observed for the whole range of signal- 
to-noise ratios (SNR), defined as ct~ 2 , although a saturation 
level is observed which translates the approximation error due 
to the asymptotic analysis. For low SNR, an avalanche effect 
is observed, which is caused by the inaccuracy of the Stieltjes 
transform approach when clusters tend to merge (here the 
cluster associated to the value a 2 grows large as a 2 increases 
and covers the clusters associated to Pi, then P%, and finally 
P 3 ). Nonetheless, this apparently strong cluster separability 
limitation generates an avalanche below the SNR level of the 
well-known avalanche effect produced by the classical large 
n estimator. The random matrix method therefore yields a 
twofold performance gain compared to the classical method, 
as it is both more accurate (which was expected) and is also 
more robust to noise which is an interesting, not necessarily 
anticipated, outcome. 

Our next example is based on a contribution from Mestre 
|(2) which initiated a series of statistical inference methods 
exploiting complex integration, among which the source de- 
tection methods described above. 



B. G-MUSIC 

The MUSIC method, originally due to Schmidt and later 
extended by McCloud and Scharf (55) , (56) , is a direction 
of arrival detection technique based on a subspace approach. 
The scenario consists of an array of N sensors (e.g. a radar 
antenna) receiving the signals originating from K sources at 
angles 61,. . . ,9k with respect to the array. The objective is 
to estimate those angles. The data vector y(t) received at time 



t by the array can be written as 

K 

y(t) = J2 s (°k)x k (t) + aw(t) 

k=l 

where s(6) G C N is a deterministic vector-valued function 
of the angle 9 G [0, 2n), Xk(t) G C is the proper complex 
Gaussian i.i.d. data sent by source k at time t and w(i) G 
is the additive proper complex Gaussian noise. The vector y(t) 
is Gaussian with covariance 

T = S(9)S(9) H +a 2 I N 

where S(6) = [s(6>i), . . . , s(6 K )\ € C WxA '. We denote as 
usual Y = [y(l), . . . , y(n)] G C Nxn the concatenated matrix 
of n independent observations. 

We call uji < ... < u>n the eigenvalues of T and 
Ui,...,ujv their associated eigenvectors. Similarly, we will 
denote Ai < ... < Ajv the eigenvalues of T = ^YY H , 
with respective eigenvectors u 1; . . . , u^r- Assuming N > K, 
the smallest N — K eigenvalues of T equal a 2 and we can 
represent T under the form 

T-(U„ U.)(-V ^(u!) 

U s = diag(wiv-if+i,---,Wiv), U s = [u N -k+i, ■ ■ ■ > u N ] 
the so-called signal space and JJw = [ui, . . . , Ujv_ij-] the 
so-called noise space. 

The basic idea of the MUSIC method is to observe that 
any vector lying in the signal space is orthogonal to the noise 
space. This leads in particular to 

V (6 k ) 4 s((? fc ) H U w UtVs((? fe ) = 

for k G {1, . . . , K}. 

A natural estimator 9k of 9k in the neighborhood of 9k 
consists in the minimal argument of 

fi(9) 4 s (9) H \J w i]^s(9) 

where TJw = [ui, . . . , u.n-k] is the eigenvector space corre- 
sponding to the smallest N — K eigenvalues of T. 

This estimator is however proven inconsistent for N, n 
growing large simultaneously in (57) , which is now not 
surprising. To produce an (N, n) -consistent estimator, recall 
( p~2| ). Similar to the derivation of the eigenvector projection in 
Section |III-F| we can write 

s(0) H U w U^s(0) = s(f?) H (T - zl N )- 1 s{9)dz 

for a contour Cfc circling around a 2 only. Connecting T to T 
from a theorem similar to Theorem [T] and performing residue 
calculus, we finally obtain a good approximation of rj(9), and 
then an (TV, n)-consistent estimator for the direction of arrival. 
This estimator consists precisely in determining the K deepest 
minima (zeros may not exist) of the function (2) 
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Fig. 7. MUSIC against G-MUSIC for DoA detection of K = 2 signal 
sources, N = 20 sensors, n = 150 samples, SNR of 10 dB. Angles of 
arrival of 35° and 37°. 



with 



(i) defined as 
1 

-E 



A,-A fc 



K 



Ai-A* 



A; — Mfc 



Ai — M/c 

, i> N -K 



i< N -K 



and where, similar to above, [i\ < ... < /xjv are the 
eigenvalues of diag(A)— iyAvA , with A = (Ai, . . . , \n) T , 
Ai < . . . < Aat the ordered eigenvalues of T. 

Observe that, contrary to the classical MUSIC method, not 
only the noise subspace but all the eigenvectors of T are used, 
although they are divided in two subsets with different weight 
policies applied to the eigenvectors in each set. This result is 
generalized to the scenario where Xk(t) is non random. In this 
case, the technique involves different reference results than the 
sample covariance matrix theorems required here but the final 
estimator is obtained similarly [58]. 

In Figure [7] the performance comparison between the tradi- 
tional MUSIC and the G-MUSIC algorithms is depicted, when 
two sources have close angles which the traditional MUSIC 
algorithm is not able to distinguish. It is clear on this single- 
shot realization that the G-MUSIC algorithm shows deeper 
minima and allows for a better source resolution. 

Note that both inference methods described above, be it 
the power or angle of arrival estimation schemes, assume a 
priori knowledge of the cluster indexes to be able to implement 
the estimators. The proposed random matrix method therefore 
does not solve the order selection problem. New techniques are 
therefore being investigated that deal with the generalization of 
the Akaike principle J59| and the minimum description length 
technique |60| for improved order selection using random 
matrix theory, see e.g. JoT) . 

Remark also that, so far, only regular sample covariance 
matrix models have been analyzed to address questions of 
array processing. In reality, due (in particular) to the non- 
Gaussian noise structure in radar detection, more advanced 
estimators of the population covariance matrix are used based 
on the celebrated Huber and Maronna robust M-estimators 



| |63) . We are currently investigating these M-estimators 
within the random matrix framework. Further intricate models 
involving multi-path propagation and the technique consisting 
in stacking successive vector observations are also under study. 

This completes the set of examples using results based on 
weak limits of large dimensional random matrices and G- 
estimation. We now turn to examples involving the results 
for the spike models, starting with simple signal detection 
procedures. 

C. Detection 

Consider the hypothesis test which consists in deciding 
whether a received signal y(t) G C N consists of pure noise, 
y(t) — erw(t), with w(t) € with Gaussian entries with 
zero mean and unit variance, for some unknown a (hypothesis 
!Ko), or containing a signal plus noise, y(t) = hx(t) +crw(t), 
for a time-independent vector channel h £ C N , a scalar proper 
complex Gaussian signal x(t) (hypothesis and a noise 
variance a 2 , which are all unknown. As usual, we gather n 
i.i.d. received data in Y = [y(l), . . . , y(n>)]. 

Since h is unknown, we cannot compare directly 3~Cq and 
J£i. Instead, we will accept or reject "Kq based on how 
Y fits the pure noise hypothesis. It is a known result that 
the generalized likelihood ratio test (GLRT) for the decision 
between %q and "K\ boils down to the test [64] 

Ai = , i t /(*) 



•J- tr YY h ) 

Nn 1 



Ml 
1 - 



with Ai the largest eigenvalue of ^YY , for / a given 
monotonic function and e the maximally acceptable false 
alarm rate. Evaluating the statistical properties of A^ for finite 
N is however rather involved and leads to impractical tests, 
see e.g. fPJ) . Instead, we consider here a very elementary 
statistical test based on Theorem 

It is clear that -^trYY H a 2 under both J£ or 

J£i. Therefore, an application of Slutsky's lemma [1| ensures 
that the asymptotic fluctuations of A^ follow a Tracy-Widom 
distribution around (1+yjV/n) 2 . The GLRT method therefore 
leads asymptotically to test the Tracy-Widom statistics for 
the appropriately centered and scaled version of A' r More 
precisely, for a false alarm rate e, this is 



(1 + V~c) 2 



(1 + ^0)3^ Mi 

with c = N/n and T2 the Tracy-Widom distribution. Further 
properties of the above statistical test, and in particular the- 
oretical expression of false alarm rates and test powers are 
derived in J65) using the theory of large deviations. Due to 
the relatively slow convergence speed of the largest eigenvalue 
distribution to the Tracy-Widom law, some recent works have 
proposed refinements, see e.g. |66|. 

In Figure [8] the performance of the GLRT detector given by 
the receiver operating curve for different false alarm rate levels 
is depicted, for small system sizes. We present a comparison 
between the GLRT approach, the empirical technique |67[ 
which consists in using the conditioning number of ^YY H 
(ratio largest and the smallest eigenvalues) as a signal detector 
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Fig. 8. Receiver operating curve for a priori unknown <r 2 of the Neyman- 
Pearson test (N-P), condition number method and GLRT, N = 4, n = 8, 
SNR= dB, h has Gaussian entries of zero mean and unit variance. For the 
Neyman-Pearson test, both uniform and Jeffreys prior, with exponent = 1, 
are provided. 

and the optimal Neyman-Pearson detector (with knowledge 
of the channel statistics) derived in [68 1. It turns out that 
the suboptimal asymptotic GLRT approach is quite close in 
performance to the finite dimensional exact optimum, the latter 
being however quite complex to implement. 

Our last application uses the second order statistics of 
eigenvalues and eigenvectors for a spike model in the context 
of failure localization in large dimensional networks. 

D. Failure detection in large networks 

The method presented here allows for fast detection and 
localization of local failures (few links or few nodes) in a 
large sensor network of N sensors gathering data about M 
system parameters 9\, . . . , 9m- Assume the linear scenario 

x(t) = H0(t) + o-w(t) 

for 9{t) = [9 1 {t),...,9 M {t)] T € C M and w(t) € C N two 
complex Gaussian signals with independent entries of zero 
mean and unit variance. The objective here is to detect a small 
rank perturbation of E[x(t)x(i) H ] = T = HH H + a 2 I N , 
originating from a local failure or change in the parameters, 
assuming T known. In particular, when the entry k of 9{t) 
shows a sudden change in variance, the model changes as 

x'(t) = H(I M + a k e k e^)9{t) + aw(t) 

for a given ctf. > —1 and with € C M the vector of all zeros 
but for efe(fc) = 1. Taking for instance a k — —1 turns the entry 
k of 9(t) into zero, corresponding to a complete collapse of 
the parameter under control. Denoting y(£) = T~2x(i) and 
y'(£) = T-3x'(i), we have E[y(f)y(t) H ] = I N , while 

E[y'(t)y' H (t)] = In + [(1 + a k f - l]T _ 5Hej : e^H H T _ i 

which is a rank-1 perturbation of Ijy. 

Now assume that we do not know whether the model 
follows the expression of y(t) or y'(i), and let us genetically 



denote both by y(t). A simple off-line failure (or change) 
detection test consists, as in Section [IV-C| to decide whether 
the largest eigenvalue of iYY H , with Y = [y(l), . . . , y(n)}, 
has a Tracy-Widom distribution. However, we wish now to 
go further and to be able to decide, upon failure detection, 
which entry 9 k of 9 was altered. For this, we need to provide 
a statistical test for the hypothesis 3£ k ^"parameter 9 k failed". 
A mere maximum likelihood procedure consisting in testing 
the Gaussian distribution of Y, assuming a k known for each 
k, is however costly for N large and becomes impractical if a k 
is unknown. Instead, one can perform a statistical test on the 
extreme eigenvalues and eigenvector projections of ^YY H , 
which mainly costs the computational time of eigenvalue 
decomposition (already performed in the detection test). For 
this, we need to assume that the number n of observations is 
sufficiently large for a failure of any parameter of 9 to be de- 
tectable. As an immediate application of Theorem [5] denoting 
A the largest eigenvalue of ^YY H and u its corresponding 
eigenvector, the estimator k for k is then given by [44 1 

fc = arg max - N ( ^ ' E - ~ ^ 

i<i<M \ A — Pi J \ X — pi J 

— log det Sj 

with u,; such that ||uj|| = 1, cjiU^u^ = [(1 + on) 2 — 
l]T"2He i e^H H T"5, and p u S l defined as a function 
of u)i in Theorem [5] (the index i refers here to a change in 
parameter 9i and not to the i-th largest eigenvalue of the small 
rank perturbation matrix). 

This procedure still assumes that ati is known for each i. If 
not, an estimator of a, can be derived and Theorem [5] adapted 
accordingly to account for the fluctuations of the estimator. 

In Figure |9| we depict the detection and localization per- 
formance for a sudden drop to zero of the parameter #i(t) 
in a scenario with M = N = 10. We see in Figure [9] 
that, compared to the detection performance, the localization 
performance is rather poor for small n but reaches the same 
level as the detection performance for larger n, which is 
explained by the inaccuracy of the Gaussian approximation 
for uii close to s/c. 

V. Conclusions 

In this short tutorial about statistical inference using large 
dimensional random matrix theory, we have argued that many 
traditional signal processing methods are inconsistent when 
both the population and system dimensions are large. We then 
introduced notions of random matrix theory which provide 
results on the spectrum of large random matrices. These results 
were then used to adjust some of these inconsistent signal 
processing methods to new consistent estimates. To this end, 
we presented a recent method based (i) on the Stieltjes trans- 
form to derive weak convergence properties of the spectrum 
of large matrices and (ii) on complex integration to derive 
estimators. This somewhat parallels the Fourier transform and 
M-estimator framework usually met in classical asymptotic 
signal processing QJ. 

Nonetheless, while classical signal processing tools are 
very mature, statistical inference based on random matrix 
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Fig. 9. Correct detection (CDR) and localization (CLR) rates for different 
levels of false alarm rates (FAR) and different values of n, for drop of variance 
of 9\. The minimal theoretical n for observability is n = 8. 

analysis still lacks many fundamental mathematical results that 
only experts can provide to this day for advanced system 
models. The main consequence is the slow appearance of 
methods to derive various estimators for more involved signal 
processing problems. The emergence of the aforementioned 
Stieltjes transform and complex integration framework is a 
rare exception to this rule which, we believe, is the foundation 
for many future breakthroughs in this area. 
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